External indexing and search for a secure cloud collaboration system

ABSTRACT

An end-to-end secure cloud-hosted collaboration service is provided with a hybrid cloud/on-premise index and search capability. This approach includes on-premise indexing and search handling, while relying on the cloud for persistent storage and search of the index. The on-premise indexer receives a copy of an encrypted message from the cloud-hosted collaboration service. The encrypted message has been encrypted with a conversation key. The indexer receives the conversation key from an on-premise key management service, and decrypts the encrypted message with the conversation key. A set of tokens is extracted from the decrypted message and subsequently encrypted with a secret key, different than the conversation key, to generate a first set of encrypted tokens. The first set of encrypted tokens is transmitted for storage in a search index on the cloud-hosted collaboration service.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/151,864, filed May 11, 2016, which is a divisional of U.S. patent application Ser. No. 14/225,636, filed Mar. 26, 2014, now issued as U.S. Pat. No. 9,363,243, the entire contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure relates to providing secure indexing and search capabilities for a cloud-based collaboration system.

BACKGROUND

Online collaboration systems allow participants from around the world to communicate and share ideas. To enable scalable solutions, a collaboration system may transition away from premise-deployed infrastructure, signaling, and media control to cloud-hosted services. However, customers may be hesitant to switch to cloud-hosted services due to a perceived loss of control over the security and privacy of the collaboration data. This perception may be exacerbated by the fact that collaboration products may carry highly confidential customer information in an easily digestible format (e.g., text, voice, video, electronic documents).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system of devices configured to participate in an online collaboration session according to an example embodiment.

FIG. 2 is a block diagram of an on-premise search indexer/handler according to an example embodiment.

FIG. 3 is a block diagram of client devices setting up ephemerally secure channels with an on-premise key management service according to an example embodiment.

FIG. 4 is a block diagram of client devices securely receiving conversation keys from the on-premise key management service according to an example embodiment.

FIG. 5 is a block diagram of client devices participating in a collaboration session according to an example embodiment.

FIG. 6 is a block diagram of an on-premise search indexer operating on a copy of the collaboration session messages to index the collaboration session for later searching according to an example embodiment.

FIG. 7 is a block diagram of a client device searching the archives of the collaboration server according to an example embodiment.

FIG. 8 is a flowchart depicting operations of an indexer processing a copy of an encrypted collaboration session according to an example embodiment.

FIG. 9 is a flowchart depicting operations of a cloud-hosted collaboration server facilitating a searchable, encrypted collaboration session according to an example embodiment.

FIG. 10 is a flowchart depicting operations of a search handler processing a search request according to an example embodiment.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Overview

The embodiments presented herein provide for a method comprising receiving a copy of an encrypted message from a cloud-hosted collaboration service. The encrypted message has been encrypted with a conversation key. The method further comprises receiving the conversation key from an on-premise key management service, and decrypting the encrypted message with the conversation key. A set of tokens is extracted from the decrypted message and subsequently encrypted with a secret key, different than the conversation key, to generate a first set of encrypted tokens. The first set of encrypted tokens is transmitted for storage in a search index on the cloud-hosted collaboration service.

Example Embodiments

With the inherently remote nature of cloud-hosted services, customers may be concerned about the privacy and security of data, such as data generated by cloud-based collaboration services. One example of a solution to ensure privacy and security in a cloud-hosted collaboration system may be to give customers on-premise control of the cryptographic keys used in establishing secure end-to-end communication sessions between client devices. In this way, the customer can maintain control over the security and privacy of the communications, while allowing the cloud-hosted system to handle the large-scale issues of distribution, high availability, message delivery, and archiving. Additionally, the cloud-based service may allow for search indexing of a collaboration session, hereinafter also called a conversation, by maintaining a scalable, encrypted search index in the cloud.

Referring to FIG. 1, an online conference system 100 is shown that enables a cloud-hosted collaboration service (CHCS) 110 to facilitate an online collaboration session (e.g., a web meeting, conversation, etc.) between client devices 120 and 122. Collaboration server 130 is provided to facilitate the conversation, and may comprise a plurality of servers as needed by the CHCS 110. Indexer/search handler 140 communicates with the CHCS 110 to provide search indexing and handle search queries as described hereinafter. Key Management Service (KMS) 150 provides authentication and cryptographic keys to clients 120 and 122, as well as to indexer 140. To give the customer control of data within the collaboration sessions, indexer 140 and KMS 150 are within trust boundary 160, and are generally referred to as on-premise assets. On-premise assets such as KMS 150 and indexer 140 may comprise computing devices that are physically located under the control of the customer. The indexer/search handler 140 and the KMS 150 may be modules of the same server, or they may be on separate co-located servers. Additionally, the on-premise devices may be located at two different locations, as long as both locations and devices are physically under the control of the customer.

The online conference session may comprise voice, video, chat, desktop sharing, application sharing, and/or other types of data communication. Only two client devices are shown in FIG. 1, but any number of client devices may be included in system 100. Client devices 120 and 122 may take a variety of forms, including a desktop computer, laptop computer, mobile/cellular phone, tablet computer, Internet telephone, etc. CHCS 110 may be provided over any type of network (e.g., any combination of Internet, intranet, local area network (LAN), wide area network (WAN), wired network, wireless network, etc.) that connects computing devices, e.g., client devices 120 and 122, collaboration server 130, indexer 140, and KMS 150. CHCS 110 may be used, for example, to mediate transactions between client devices 120 and 122. CHCS 110 may also perform caching or other time/bandwidth saving techniques. It should be understood that in a web-based conference system, each device may communicate with the CHCS 110 through a browser application having one or more plug-ins that enable a web-based meeting and allow for the transmission of data to the collaboration server 130, and the reception of data from the collaboration server 130, during a conversation.

Referring now to FIG. 2, a simplified block diagram of indexer/search handler 140 is shown. Indexer 140 includes a processor 210 to process instructions relevant to an online collaboration session supported by the system 100, and memory 220 to store a variety of data and software instructions (e.g., audio, video, control data, etc.). The indexer 140 also includes a cloud network interface unit (e.g., card) 230 to communicate with CHCS 110 and client devices 120 and 122. Indexer 140 further comprises a local network interface 240 to communicate securely within trust boundary 160, e.g., to receive cryptographic keys from KMS 150. The distinction between cloud network interface 230 and local network interface 240 may be purely logical, such that both network interfaces 230 and 240 may be part of the same physical network interface unit. Indexer 140 also includes indexing module 250 to perform the indexing of conversations and search handling module 260 to handle any search queries about the conversation. Memory 220 may comprise read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible (e.g., non-transitory) memory storage devices. The processor 210 is, for example, a microprocessor or microcontroller that executes instructions for implementing the processes described herein. Thus, in general, the memory 220 may comprise one or more tangible (non-transitory) computer-readable storage media (e.g., a memory device) encoded with software comprising computer-executable instructions, and when the software is executed (by the processor 210) it is operable to perform the operations described herein.

Referring now to FIG. 3, a simplified flow diagram of setting up ephemerally secure channels is shown. In this exchange of messages, which may be a precursor to a collaboration session, clients 120 and 122 set up ephemerally secure communication channels 310 and 312, respectively. In one example, the KMS 150 is responsible for generating, distributing, and maintaining records of all cryptographic keys issued to any clients within a single trust boundary, such as a corporation. In some examples, the trust boundary may be pushed out to the service provider level. In another example, the KMS 150 acts as a client to the CHCS 110, so that it is able to receive messages from and send messages to client devices 120 and 122. The KMS 150 may authenticate itself to client devices 120 and/or 122 by signing its messages with the private key corresponding to a public certificate that has been issued by a mutually trusted certificate authority. The client devices 120 and 122 are aware of KMS 150, and may display details of the key management system involved in a conversation to the user, such as the common name (CN) from the KMS's public certificate.

In one example, client device 120 establishes a secure communication channel with the KMS 150 when it starts up a collaboration session by performing a signed Diffie-Hellman ephemeral key exchange in which messages are relayed through the CHCS 110. Alternatively, an ephemerally secure channel may be created for each request in the conversation. The Diffie-Hellman exchange can be standard or elliptic curve based, or can use any other method for securely exchanging a shared secret over an insecure communication channel. In order to prevent a man-in-the-middle attack, securing these exchange messages may include encrypting all client-to-KMS messages with the public key of the KMS 150 and signing all KMS-to-client messages with the private key of the KMS 150. Once the ephemerally secure messaging channels 310 and 312 are established, clients 120 and 122 may each use their respective channel for subsequent requests. Each request may include an authorization token that can be validated by the KMS 150 and that is different from any authorization token used for communications with the CHCS 110, which prevents the CHCS 110 from replaying authorization tokens to the KMS 150 to retrieve cryptographic keys.
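The unsigned core of the ephemeral exchange can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the described implementation: it uses a toy finite-field group (the Mersenne prime 2**521 - 1 with generator 3) instead of a standardized group or X25519, the function names are hypothetical, and it omits the KMS signature entirely, so on its own it would be open to exactly the man-in-the-middle attack that the signing step prevents.

```python
import hashlib
import secrets

# Toy group parameters (assumption): a well-known prime and a small generator,
# chosen only for illustration. Real deployments would use a standardized group.
P = 2**521 - 1
G = 3


def make_keypair() -> tuple[int, int]:
    """Return an ephemeral (private, public) Diffie-Hellman key pair."""
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    return private, pow(G, private, P)


def derive_channel_key(own_private: int, peer_public: int) -> bytes:
    """Compute the shared secret and hash it down to a 256-bit channel key."""
    if not 2 <= peer_public <= P - 2:
        raise ValueError("peer public value out of range")
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes((P.bit_length() + 7) // 8, "big")).digest()


def chcs_relay(public_value: int) -> int:
    """Hypothetical relay: the CHCS forwards public values and never sees a private exponent."""
    return public_value


client_priv, client_pub = make_keypair()
kms_priv, kms_pub = make_keypair()

# Each side receives only the other's public value, relayed through the CHCS.
client_key = derive_channel_key(client_priv, chcs_relay(kms_pub))
kms_key = derive_channel_key(kms_priv, chcs_relay(client_pub))
```

Both ends arrive at the same 32-byte channel key even though only public values crossed the CHCS.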

Referring now to FIG. 4, a simplified flow diagram of client devices securing conversation keys for a collaboration session is shown. KMS 150 includes a database 410 of cryptographic keys that are used as conversation keys. After establishing ephemerally secure channels as shown in FIG. 3, clients 120 and 122 send requests 420 and 422, respectively, for a conversation key. KMS server 150 retrieves or generates a cryptographic key for clients 120 and 122 to use as a conversation key, and sends the conversation key back in messages 430 and 432, respectively.

In one example, when client device 120 starts a collaboration session with an invitation to client 122, it notifies CHCS 110 of the invitation that is going to be sent. The CHCS 110 notifies the KMS 150, which generates a cryptographic key, associates it with the soon-to-be-established CHCS conversation, stores a copy of the key for subsequent use along with a list of authorized participants, and relays the conversation key through the ephemerally secured channels. According to one example, the conversation key is also associated with a conversation identifier. Once each client has received the conversation key, it can send symmetrically encrypted messages through the CHCS 110 to other conversation participants without concern that the messages may be decrypted by the CHCS 110. Correspondingly, when client 122 receives the invitation to join a conversation, it sends a request 422 over its ephemerally secured CHCS channel to the KMS 150, along with its authorization token. Assuming the authorization token is valid for the requested conversation (i.e., client device 122 is an authorized participant), the KMS 150 responds over the ephemerally secure channel with message 432 comprising the conversation key.
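The KMS bookkeeping described above (a key per conversation, a list of authorized participants, and KMS-specific authorization tokens) can be sketched as follows. The class, method names, and the choice of HMAC-derived tokens are all hypothetical; the text does not specify how tokens are minted or validated, and a real KMS would persist this state in database 410 rather than in memory.

```python
import hashlib
import hmac
import secrets


class KeyManagementService:
    """Hypothetical on-premise KMS; all names here are illustrative."""

    def __init__(self) -> None:
        # Used only to mint and check KMS tokens, so a token presented to the
        # CHCS cannot be replayed against the KMS.
        self._token_secret = secrets.token_bytes(32)
        self._conversations: dict[str, dict] = {}

    def issue_token(self, client_id: str) -> str:
        """Mint a KMS-specific authorization token for a client (assumed scheme)."""
        return hmac.new(self._token_secret, client_id.encode(), hashlib.sha256).hexdigest()

    def create_conversation(self, conversation_id: str, participants: set[str]) -> bytes:
        """Generate a conversation key and record who may retrieve it."""
        key = secrets.token_bytes(32)
        self._conversations[conversation_id] = {"key": key, "participants": set(participants)}
        return key

    def get_key(self, conversation_id: str, client_id: str, token: str) -> bytes:
        """Return the key only to an authenticated, authorized participant."""
        if not hmac.compare_digest(self.issue_token(client_id), token):
            raise PermissionError("invalid authorization token")
        record = self._conversations.get(conversation_id)
        if record is None or client_id not in record["participants"]:
            raise PermissionError("client is not an authorized participant")
        return record["key"]


kms = KeyManagementService()
key_120 = kms.create_conversation("conv-1", {"client-120", "client-122"})
key_122 = kms.get_key("conv-1", "client-122", kms.issue_token("client-122"))
```

Both participants end up holding the same key, while a request with a forged token or from a non-participant is refused.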

If a conversation member is added or removed after a conversation is established, the KMS 150 is notified of the change in participants. The KMS 150 can either add or revoke the authorization of the new or removed members in its database and rely on the CHCS to start or stop delivering messages to the new or removed members. Alternatively, the KMS 150 may generate a new conversation key, store the key and participant authorizations in its database, and distribute the new conversation key to the modified list of participants. Generating a new conversation key when a participant is removed from a conversation ensures that a malicious participant is unable to collude with the CHCS to continue to passively participate in a conversation after being removed.
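The re-keying alternative can be sketched as a small record that rotates the key whenever a member is removed. The class and method names are hypothetical, and keeping a key history (so that earlier archived messages stay decryptable by legitimate members) is an assumption the text does not state.

```python
import secrets


class ConversationKeys:
    """Hypothetical KMS record that rotates the key on member removal."""

    def __init__(self, participants: set[str]) -> None:
        self.participants = set(participants)
        self.keys = [secrets.token_bytes(32)]  # key history (assumption); last is current

    @property
    def current_key(self) -> bytes:
        return self.keys[-1]

    def remove_participant(self, member: str) -> dict[str, bytes]:
        """Revoke a member, rotate the key, and return the new key per remaining member."""
        self.participants.discard(member)
        self.keys.append(secrets.token_bytes(32))
        # In the described system each copy would travel over that member's
        # ephemerally secure channel; here we only build the distribution list.
        return {p: self.current_key for p in self.participants}


conv = ConversationKeys({"client-120", "client-122", "client-124"})
old_key = conv.current_key
distribution = conv.remove_participant("client-124")
```

The removed member never receives the new key, so even if the CHCS keeps delivering ciphertext to it, that member cannot read messages encrypted after the rotation.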

Referring now to FIG. 5, a simplified block diagram of a CHCS archiving a conversation is described. In one example, CHCS 110 includes an archive 510 for storing messages in any conversation that is facilitated by collaboration server 130. In this example, client 120 exchanges messages 520 with collaboration server 130, and client 122 exchanges messages 522 with collaboration server 130. Through messages 520 and 522, clients 120 and 122 participate in a collaboration session. Collaboration server 130 sends copies 530 of the messages from the collaboration session to be stored in archive 510. In one example, the archived messages are associated with the corresponding conversation identifier. Within the CHCS 110, the archive 510 stores encrypted messages without access to the data within them. A client that is able to retrieve a message from archive 510 would also need the appropriate authorization to obtain the conversation key from the KMS 150 in order to decrypt the archived message.
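The property that archive 510 holds only opaque ciphertext can be illustrated with a short sketch. The Python standard library has no AES, so this sketch substitutes a stand-in authenticated cipher (an HMAC-SHA256 keystream in counter mode with encrypt-then-MAC); that substitution is an assumption made only to keep the example self-contained, and a real client would use a vetted AEAD such as AES-GCM. The archive layout and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict

NONCE_LEN, TAG_LEN = 16, 32


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from the conversation key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, standing in for a real stream cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC; returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then remove the keystream; raises ValueError on failure."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("message too short")
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed: wrong key or tampered message")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Hypothetical archive 510: conversation ID -> opaque ciphertexts.
# The CHCS stores and returns these bytes but never holds the conversation key.
archive: defaultdict[str, list[bytes]] = defaultdict(list)

conversation_key = secrets.token_bytes(32)  # held by clients, obtained from the KMS
archive["conv-1"].append(encrypt(conversation_key, b"Q3 budget draft attached"))
```

Only a holder of the conversation key recovers the plaintext; any other key fails authentication rather than producing garbage.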

Referring now to FIG. 6, a simplified block diagram is described for the operation of an on-premise indexer. To maximize scalability in one example, CHCS 110 maintains a search index 610 in the cloud. To ensure access control and privacy, a copy 620 of each message passed through the server 130 as part of a conversation is forwarded to indexer 140. In response to receiving message 620, indexer 140 sends a message 630 to KMS 150, through a private internal channel, requesting the conversation key to decrypt message 620. KMS 150 responds with message 640, which includes the corresponding conversation key. In one example, the corresponding conversation key is identified through the conversation identifier associated with message 620. Indexer 140 decrypts message 620 and extracts searchable tokens from the content. In the case of dictionary words that are extracted, the indexer 140 may optionally stem the words to their root form. Once a list of tokens is extracted, the indexer 140 may filter the tokens for a variety of reasons, e.g., against stop-word lists.
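The tokenize, stem, and filter steps might look like the sketch below. The stop-word list and the crude repeated suffix stripping are hypothetical stand-ins chosen for brevity; a production indexer would more likely use an established stemmer (for example Porter's algorithm) and a fuller, language-aware stop-word list.

```python
import re

# Illustrative stop-word list (assumption); real deployments would use a fuller list.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Repeatedly strip known suffixes while at least three characters remain."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def extract_tokens(text: str) -> list[str]:
    """Lowercase, split into words, drop stop words, stem, and de-duplicate."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    stems = (stem(w) for w in words if w not in STOP_WORDS)
    return list(dict.fromkeys(stems))  # keeps first-seen order, removes duplicates
```

For example, "The budgets and the budget" reduces to the single token "budget", so a later search for either form lands on the same index entry.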

In one example, the finalized list of searchable tokens is combined with a relatively large secret key provided by KMS 150 and passed through a hash-based message authentication code (HMAC) function. The resulting list of HMACs, along with the conversation identifier, is sent to the CHCS as message 650, where it is stored in index 610. The large secret key provided by KMS 150 functions in a manner similar to a cryptographic salt, in that it prevents “rainbow” table attacks. However, unlike a salt, the secret must remain secret in order to prevent dictionary attacks on the tokens stored in index 610 of CHCS 110.
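The keyed-hash step and the cloud-side index can be sketched with Python's `hmac` module. The choice of SHA-256 as the underlying hash, the 64-byte secret length, and the dictionary layout of the index are assumptions; the text specifies only an HMAC over each token with a large KMS-provided secret, stored with the conversation identifier.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def blind_tokens(tokens: list[str], secret_key: bytes) -> list[str]:
    """HMAC-SHA256 each token under the KMS-provided secret key."""
    return [hmac.new(secret_key, t.encode("utf-8"), hashlib.sha256).hexdigest() for t in tokens]


# Hypothetical cloud-side search index 610: blinded token -> conversation IDs.
search_index: defaultdict[str, set[str]] = defaultdict(set)


def store_in_index(conversation_id: str, blinded: list[str]) -> None:
    """What the CHCS does with message 650: record opaque digests only."""
    for digest in blinded:
        search_index[digest].add(conversation_id)


secret_key = secrets.token_bytes(64)  # "relatively large"; 64 bytes is an assumption
store_in_index("conv-1", blind_tokens(["budget", "forecast"], secret_key))
```

Because the digest depends on the secret, an attacker who can read index 610 but lacks the key cannot precompute digests of dictionary words to test against it, which is the dictionary-attack protection the paragraph describes.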

Referring now to FIG. 7, a simplified flow diagram is described for the operation of an on-premise search handler. Client 120 sends a message 710 to search handler 140 that includes an encrypted search query. Search handler 140 may then perform the same steps on the search query that indexer 140 performs when it receives a conversation message. In other words, search handler 140 decrypts the search query, tokenizes it, optionally stems dictionary words, optionally filters the tokens, requests the large secret key from KMS 150 with message 720, receives the secret in message 730, combines the secret with each token, derives an HMAC for each token and secret combination, and sends the processed search query HMACs to the CHCS 110 as message 740. When the CHCS 110 receives message 740, it queries search index 610 for matches between the encrypted search query tokens and the encrypted conversation message tokens. If any tokens match, the corresponding conversation identifier is returned to search handler 140 in message 750. Additionally, the CHCS 110 may return the encrypted conversation retrieved from archive 510 as part of message 750. The search handler 140 then retrieves the conversation keys for the search results, combines the conversation key(s) with the conversation identifier(s) (or archived conversation messages), and sends the search results back to client 120 as message 760.
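On the CHCS side, answering message 740 reduces to set lookups over opaque digests. The sketch below follows the "if any tokens match" wording (a match on any single token suffices) and also returns the still-encrypted archived messages, as message 750 may. The function name, data layout, and demo values are hypothetical, and ranking or all-terms matching are not covered.

```python
import hashlib
import hmac
import secrets


def match_query(
    query_hmacs: list[str],
    search_index: dict[str, set[str]],
    archive: dict[str, list[bytes]],
) -> dict[str, list[bytes]]:
    """Return archived (still encrypted) messages for every conversation that
    matches at least one query digest; the CHCS only compares opaque HMACs."""
    matches: set[str] = set()
    for digest in query_hmacs:
        matches |= search_index.get(digest, set())
    return {cid: archive.get(cid, []) for cid in sorted(matches)}


# Demo data: digests produced on-premise with a secret the CHCS never sees.
secret = secrets.token_bytes(64)


def h(token: str) -> str:
    return hmac.new(secret, token.encode(), hashlib.sha256).hexdigest()


index = {h("budget"): {"conv-1"}, h("hiring"): {"conv-2"}}
archive = {"conv-1": [b"<ciphertext 1>"], "conv-2": [b"<ciphertext 2>"]}
results = match_query([h("budget"), h("travel")], index, archive)
```

Here the query digests for "budget" and "travel" match only conv-1, and the CHCS learns that a match occurred without learning which words were involved.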

In one example, client 120 may set up an ephemerally secure channel with search handler 140 in the same manner as described above with respect to the client 120 retrieving a conversation key. The search handler 140 may require the client 120 to send an authorization token that is different from any used for communications with the CHCS 110 to ensure that CHCS 110 is not able to break the encryption through a replay of the authorization tokens.

Referring now to FIG. 8, an example flowchart of a process 800 for securely indexing a cloud-hosted collaboration session is described. In step 810, indexer 140 receives a copy of an encrypted conversation message. After receiving the conversation key in step 820, the indexer 140 decrypts the conversation message in step 830. In step 840, the indexer extracts searchable tokens (e.g., key words) from the decrypted message. Step 840 may include stemming words to their root words and/or filtering words, e.g., against a list of stop-words too common to be useful in search results. Each of the tokens is encrypted with a secret key in step 850, and the encrypted tokens are transmitted for storage in a cloud-hosted search index at step 860.
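Process 800 can be sketched as a single function whose collaborators are passed in, so each numbered step is visible. Every callable name is hypothetical, and the demo wires in trivial placeholders (an identity "decrypt" and whitespace splitting) purely to show the data flow, not real cryptography or tokenization.

```python
import hashlib
import hmac
from typing import Callable


def index_message(
    encrypted_message: bytes,                               # received in step 810
    conversation_id: str,
    fetch_conversation_key: Callable[[str], bytes],         # step 820: ask the KMS
    decrypt: Callable[[bytes, bytes], bytes],               # step 830
    extract_tokens: Callable[[str], list[str]],             # step 840 (stem/filter inside)
    secret_key: bytes,
    send_to_cloud_index: Callable[[str, list[str]], None],  # step 860
) -> list[str]:
    """Run steps 820-860 for one message and return the blinded tokens."""
    key = fetch_conversation_key(conversation_id)
    plaintext = decrypt(key, encrypted_message).decode("utf-8")
    tokens = extract_tokens(plaintext)
    blinded = [hmac.new(secret_key, t.encode(), hashlib.sha256).hexdigest()
               for t in tokens]                                # step 850
    send_to_cloud_index(conversation_id, blinded)
    return blinded


sent: dict[str, list[str]] = {}
result = index_message(
    b"quarterly budget review",
    "conv-1",
    fetch_conversation_key=lambda cid: b"k" * 32,
    decrypt=lambda key, blob: blob,  # placeholder only: no real cipher in this demo
    extract_tokens=lambda text: text.split(),
    secret_key=b"s" * 64,            # fixed bytes only for the demo
    send_to_cloud_index=lambda cid, hmacs: sent.update({cid: hmacs}),
)
```

Injecting the collaborators keeps the on-premise logic independent of how the KMS channel or cloud interface is actually implemented.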

In one example, the extracted tokens are encrypted with a secret key that is received from the KMS 150. Alternatively, the indexer 140 may maintain a secret to be used on any conversations that it is assigned to index and/or search. The secret key may be the same for a plurality of conversations involving a single trust boundary, such as a single corporation. Alternatively, a new secret key may be generated for each separate conversation.
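The two key-scoping options can be contrasted in a small sketch. The class and its interface are hypothetical, and generating per-conversation secrets lazily on first use is an assumption about when "a new secret key may be generated".

```python
import secrets


class SecretKeyPolicy:
    """Hypothetical holder of the token-blinding secret(s) for one trust boundary."""

    def __init__(self, per_conversation: bool) -> None:
        self.per_conversation = per_conversation
        self._boundary_key = secrets.token_bytes(64)
        self._conversation_keys: dict[str, bytes] = {}

    def key_for(self, conversation_id: str) -> bytes:
        if not self.per_conversation:
            return self._boundary_key  # one secret shared by the whole corporation
        if conversation_id not in self._conversation_keys:
            # Fresh secret the first time a conversation is seen (assumption).
            self._conversation_keys[conversation_id] = secrets.token_bytes(64)
        return self._conversation_keys[conversation_id]


shared = SecretKeyPolicy(per_conversation=False)
per_conv = SecretKeyPolicy(per_conversation=True)
```

The choice has a search-time cost: since the same token yields a different HMAC under each key, per-conversation secrets force the search handler to blind a query once per conversation key it may search, whereas a boundary-wide secret lets one blinded query match across every conversation in the index.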

Referring now to FIG. 9, an example flowchart of a process 900 for searching encrypted archives of conversations is described. In step 910, the CHCS 110 facilitates an encrypted collaboration session. As part of facilitating the collaboration session, at step 920, the CHCS 110 stores an encrypted copy of each of the messages that pass through the CHCS 110 as part of the conversation. The CHCS 110 forwards a copy of each of the encrypted messages to search indexer 140 in step 930. In step 940, the CHCS 110 receives a set of encrypted tokens corresponding to the messages from the search indexer 140, and stores the encrypted tokens in a search index at step 950. The encrypted messages and encrypted tokens may each be associated and stored with an unencrypted conversation identifier.
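The CHCS side of process 900 can be sketched as a class that archives, forwards, and indexes without ever holding a key. Class, method, and callback names are hypothetical, and live delivery to the other participants (part of step 910) is omitted.

```python
from collections import defaultdict
from typing import Callable


class CloudCollaborationService:
    """Hypothetical CHCS side of process 900; it never holds a decryption key."""

    def __init__(self, forward_to_indexer: Callable[[str, bytes], None]) -> None:
        self.archive: defaultdict[str, list[bytes]] = defaultdict(list)
        self.search_index: defaultdict[str, set[str]] = defaultdict(set)
        self._forward = forward_to_indexer

    def relay_message(self, conversation_id: str, encrypted: bytes) -> None:
        """Steps 910-930: (delivery omitted) archive and forward a copy."""
        self.archive[conversation_id].append(encrypted)  # step 920
        self._forward(conversation_id, encrypted)        # step 930

    def receive_tokens(self, conversation_id: str, encrypted_tokens: list[str]) -> None:
        """Steps 940-950: store indexer output under the cleartext conversation ID."""
        for token in encrypted_tokens:
            self.search_index[token].add(conversation_id)


forwarded: list[tuple[str, bytes]] = []
chcs = CloudCollaborationService(lambda cid, blob: forwarded.append((cid, blob)))
chcs.relay_message("conv-1", b"<ciphertext>")
chcs.receive_tokens("conv-1", ["a1b2", "c3d4"])  # opaque HMACs from the indexer
```

Only the conversation identifier is stored in the clear; message bodies and tokens remain opaque to the service.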

Referring now to FIG. 10, an example flowchart of a process 1000 for handling search requests is described. In step 1010, the search handler receives a search query, and extracts tokens from the search query at step 1020. The search handler encrypts the search query tokens at step 1030 with the same secret key that the search indexer used to encrypt the tokens extracted from the conversation messages. In step 1040, the search handler transmits the encrypted search query tokens as a search query on a cloud-hosted search index, and receives the matching conversation identifier(s) as search results in step 1050. The search results may also include encrypted copies of the conversation.
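Process 1000 mirrors the indexing path, and a sketch makes the key requirement concrete: the query only matches if it is blinded with the same secret the indexer used. The function name, the injected callables, and the fixed demo secret are hypothetical; the demo uses simple lowercasing and splitting in place of the indexer's full stem-and-filter pipeline.

```python
import hashlib
import hmac
from typing import Callable


def handle_search(
    query: str,                                       # received in step 1010
    secret_key: bytes,
    extract_tokens: Callable[[str], list[str]],
    query_cloud_index: Callable[[list[str]], list[str]],
) -> list[str]:
    """Steps 1020-1050: tokenize, blind, query the cloud index, return conversation IDs."""
    tokens = extract_tokens(query)                                   # step 1020
    blinded = [hmac.new(secret_key, t.encode(), hashlib.sha256).hexdigest()
               for t in tokens]                                      # step 1030
    return query_cloud_index(blinded)                                # steps 1040-1050


secret = b"s" * 64  # must equal the indexer's key; fixed bytes only for the demo


def blind(token: str) -> str:
    return hmac.new(secret, token.encode(), hashlib.sha256).hexdigest()


cloud_index = {blind("budget"): {"conv-1"}, blind("hiring"): {"conv-2"}}


def lookup(digests: list[str]) -> list[str]:
    return sorted({cid for d in digests for cid in cloud_index.get(d, set())})


results = handle_search("Budget", secret, lambda q: q.lower().split(), lookup)
```

With the correct secret the query "Budget" finds conv-1; with any other secret the same query matches nothing.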

In summary, the techniques presented herein provide a hybrid cloud/on-premise index and search capability for an end-to-end secure cloud-hosted collaboration solution. This approach involves on-premise indexing and search handling, while relying on the cloud for persistent storage and search of the index. This allows the on-premise computing power and costs to scale with the volume of searches rather than the volume of content. The end-to-end encryption maintains the confidentiality of the conversations, and the encryption is removed only for search purposes within the on-premise trust boundary.

In one example, the techniques presented herein provide for a method comprising receiving a copy of an encrypted message from a cloud-hosted collaboration service. The encrypted message has been encrypted with a conversation key. The method further comprises receiving the conversation key from an on-premise key management service, and decrypting the encrypted message with the conversation key. A set of tokens is extracted from the decrypted message and subsequently encrypted with a secret key, different than the conversation key, to generate a first set of encrypted tokens. The first set of encrypted tokens is transmitted for storage in a search index on the cloud-hosted collaboration service.

In another example, the techniques presented herein provide for a search indexer comprising a cloud network interface configured to receive a copy of a message from a cloud-hosted collaboration service encrypted with a conversation key. The indexer also comprises a local network interface configured to receive the conversation key from an on-premise key management service. The indexer also includes a processor configured to decrypt the conversation message with the conversation key and extract a set of tokens from the decrypted message. The processor is further configured to encrypt the set of tokens with a secret key that is different than the conversation key to generate a first set of encrypted tokens, which is then transmitted by the cloud network interface for storage in a search index on the cloud-hosted collaboration service.

In a further example, the techniques presented herein provide for a method for a cloud-hosted collaboration service to provide search indexing for an end-to-end encrypted collaboration session without decrypting the messages in the cloud. The method comprises facilitating the collaboration session comprising a plurality of encrypted messages and storing the plurality of messages in association with a conversation identifier. A copy of the plurality of messages is transmitted along with the conversation identifier to a search indexer. A first set of encrypted tokens associated with the conversation identifier is received from the search indexer, and stored in a search index in association with the conversation identifier.

The above description is intended by way of example only. Various modifications and structural changes may be made therein without departing from the scope of the concepts described herein and within the scope and range of equivalents of the claims.

What is claimed is:
 1. A method comprising: receiving a search query from a client device; extracting a set of search tokens from the search query; receiving a secret key from an on-premise key management service; encrypting the set of search tokens with the secret key to generate a first set of encrypted tokens; transmitting the first set of encrypted tokens to be matched against a second set of encrypted tokens stored on a cloud-hosted collaboration service; and receiving a conversation identifier associated with the second set of encrypted tokens that matches the first set of encrypted tokens.
 2. The method of claim 1, further comprising: receiving a plurality of encrypted messages associated with the conversation identifier; and transmitting the plurality of encrypted messages to the client device as search results of the search query.
 3. The method of claim 1, further comprising transmitting the conversation identifier to the client device as search results of the search query.
 4. The method of claim 1, wherein the first set of encrypted tokens and the second set of encrypted tokens each comprises a set of hash-based message authentication codes (HMACs).
 5. The method of claim 1, wherein extracting the set of search tokens from the search query further comprises extracting at least one key word from the search query and determining roots, variations, or plurals of the at least one key word.
 6. The method of claim 5, further comprising filtering the at least one key word against a set of stop words.
 7. The method of claim 1, wherein encrypting the set of tokens comprises combining each of the set of tokens with the secret key to produce a set of combinations, and passing each of the set of combinations through a hash-based message authentication code (HMAC) function to generate the first set of encrypted tokens as a set of HMACs.
 8. An apparatus comprising: a network interface unit configured to: receive a search query from a client device; and receive a secret key from an on-premise key management service; and a processor configured to: extract a set of search tokens from the search query; encrypt the set of search tokens with the secret key to generate a first set of encrypted tokens; cause the network interface unit to transmit the first set of encrypted tokens to be matched against a second set of encrypted tokens stored on a cloud-hosted collaboration service; and receive a conversation identifier associated with the second set of encrypted tokens that matches the first set of encrypted tokens.
 9. The apparatus of claim 8, wherein the network interface unit is further configured to: receive a plurality of encrypted messages associated with the conversation identifier; and transmit the plurality of encrypted messages to the client device as search results of the search query.
 10. The apparatus of claim 8, wherein the network interface unit is further configured to transmit the conversation identifier to the client device as search results of the search query.
 11. The apparatus of claim 8, wherein the first set of encrypted tokens and the second set of encrypted tokens each comprises a set of hash-based message authentication codes (HMACs).
 12. The apparatus of claim 8, wherein the processor is configured to extract the set of search tokens from the search query by extracting at least one key word from the search query and determining roots, variations, or plurals of the at least one key word.
 13. The apparatus of claim 12, wherein the processor is further configured to filter the at least one key word against a set of stop words.
 14. The apparatus of claim 8, wherein the processor is configured to encrypt the set of tokens by combining each of the set of tokens with the secret key to produce a set of combinations, and pass each of the set of combinations through a hash-based message authentication code (HMAC) function to generate the first set of encrypted tokens as a set of HMACs.
 15. One or more non-transitory computer readable storage media encoded with software comprising executable instructions and when the software is executed operable to cause a processor to: receive a search query from a client device; extract a set of search tokens from the search query; receive a secret key from an on-premise key management service; encrypt the set of search tokens with the secret key to generate a first set of encrypted tokens; transmit the first set of encrypted tokens to be matched against a second set of encrypted tokens stored on a cloud-hosted collaboration service; and receive a conversation identifier associated with the second set of encrypted tokens that matches the first set of encrypted tokens.
 16. The non-transitory computer readable storage media of claim 15, further comprising instructions operable to cause the processor to: receive a plurality of encrypted messages associated with the conversation identifier; and transmit the plurality of encrypted messages to the client device as search results of the search query.
 17. The non-transitory computer readable storage media of claim 15, further comprising instructions operable to cause the processor to transmit the conversation identifier to the client device as search results of the search query.
 18. The non-transitory computer readable storage media of claim 15, further comprising instructions operable to cause the processor to extract the set of search tokens from the search query by extracting at least one key word from the search query and determining roots, variations, or plurals of the at least one key word.
 19. The non-transitory computer readable storage media of claim 18, further comprising instructions operable to cause the processor to filter the at least one key word against a set of stop words.
 20. The non-transitory computer readable storage media of claim 15, further comprising instructions operable to cause the processor to encrypt the set of tokens by combining each of the set of tokens with the secret key to produce a set of combinations, and passing each of the set of combinations through a hash-based message authentication code (HMAC) function to generate the first set of encrypted tokens as a set of HMACs.